Functional derivative

In mathematics and theoretical physics, the functional derivative is a generalization of the gradient. While the latter differentiates with respect to a vector with discrete components, the former differentiates with respect to a continuous function. Both of these can be viewed as extensions of the simple one-dimensional derivative in usual calculus. The mathematically formal treatment is the subject of functional analysis.

Definition

Given a manifold M of functions \varphi (continuous, smooth, satisfying certain boundary conditions, etc.) and a functional F defined as

F\colon M \rightarrow \mathbb{R} \quad \mbox{or} \quad F\colon M \rightarrow \mathbb{C} ,

the functional derivative of F, denoted {\delta F}/{\delta\varphi}, is a distribution such that for all test functions f

\left\langle \frac{\delta F[\varphi(x)]}{\delta\varphi(x)}, f(x) \right\rangle = \int \frac{\delta F[\varphi(x)]}{\delta\varphi(x')} f(x')\,dx' = \lim_{\varepsilon\to 0}\frac{F[\varphi(x)+\varepsilon f(x)]-F[\varphi(x)]}{\varepsilon} = \left.\frac{d}{d\varepsilon}F[\varphi+\varepsilon f]\right|_{\varepsilon=0}.

Replacing f with the first variation \delta\varphi of \varphi yields the first variation \delta F of F, much as the differential is obtained from the gradient. Choosing f with unit norm yields the directional derivative of F along f.
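The limit definition can be checked numerically once functions are sampled on a grid and integrals become quadrature sums. The following Python sketch is illustrative only: the functional F[\varphi] = \int_0^1 \varphi(x)^2\,dx (whose functional derivative is 2\varphi), the choices of \varphi and f, and the helper names trapezoid and gateaux are assumptions, not part of the text above. It compares a central-difference estimate of \frac{d}{d\varepsilon}F[\varphi+\varepsilon f] at \varepsilon=0 with \langle 2\varphi, f \rangle.

```python
import math

# Hypothetical example functional: F[phi] = integral over [0, 1] of phi(x)^2,
# whose functional derivative is dF/dphi(x) = 2 phi(x).

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))

def F(phi_vals, h):
    return trapezoid([p * p for p in phi_vals], h)

def gateaux(functional, phi_vals, f_vals, h, eps=1e-6):
    """Central-difference estimate of d/d(eps) functional[phi + eps f] at eps = 0."""
    plus = [p + eps * q for p, q in zip(phi_vals, f_vals)]
    minus = [p - eps * q for p, q in zip(phi_vals, f_vals)]
    return (functional(plus, h) - functional(minus, h)) / (2 * eps)

n = 1001
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
phi = [math.sin(math.pi * x) for x in xs]
f = [x * (1 - x) for x in xs]                            # test function

lhs = gateaux(F, phi, f, h)                              # limit definition
rhs = trapezoid([2 * p * q for p, q in zip(phi, f)], h)  # <dF/dphi, f>
print(lhs, rhs)   # both close to 8/pi^3, about 0.258
```

Because this F is quadratic, the central difference is exact apart from rounding, so the two numbers agree to many digits.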

In physics, it is common to use the Dirac delta function \delta(x-y) in place of a generic test function f(x), which yields the functional derivative at the point y (this value is one component of the whole functional derivative, in the same way that a partial derivative is one component of the gradient):

\frac{\delta F[\varphi(x)]}{\delta \varphi(y)}=\lim_{\varepsilon\to 0}\frac{F[\varphi(x)+\varepsilon\delta(x-y)]-F[\varphi(x)]}{\varepsilon}.

This works when F[\varphi(x)+\varepsilon f(x)] can formally be expanded as a series in \varepsilon (or at least to first order). The formula is not mathematically rigorous, however, since F[\varphi(x)+\varepsilon\delta(x-y)] is usually not even defined.
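On a uniform grid with spacing h, the delta function can be given a concrete (if crude) stand-in: a spike of height 1/h at one node, which integrates to 1. With that discretization, the limit formula above becomes an ordinary partial derivative divided by h. The Python sketch below assumes the hypothetical functional F[\varphi] = \int_0^1 \cos\varphi(x)\,dx, whose functional derivative is -\sin\varphi(y), and uses a central difference in \varepsilon instead of the one-sided quotient; the name derivative_at_node is illustrative.

```python
import math

# Hypothetical example functional: F[phi] = integral over [0, 1] of cos(phi(x)),
# for which dF/dphi(y) = -sin(phi(y)).

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))

def F(phi_vals, h):
    return trapezoid([math.cos(p) for p in phi_vals], h)

def derivative_at_node(functional, phi_vals, j, h, eps=1e-6):
    """Approximate dF/dphi(x_j) with a discrete delta function at node j.

    delta(x - x_j) is represented by a spike of height 1/h at node j, which
    integrates to 1 under the trapezoidal rule (for an interior node). A central
    difference in eps replaces the one-sided quotient of the limit formula.
    """
    plus = list(phi_vals)
    minus = list(phi_vals)
    plus[j] += eps / h
    minus[j] -= eps / h
    return (functional(plus, h) - functional(minus, h)) / (2 * eps)

n = 1001
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
phi = [math.exp(-x) for x in xs]

j = 400                                    # interior node, x_j = 0.4
approx = derivative_at_node(F, phi, j, h)
exact = -math.sin(phi[j])
print(approx, exact)
```

The spike height eps/h grows as the grid is refined, which is the discrete counterpart of the remark that F[\varphi+\varepsilon\delta] is usually not defined in the continuum.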

Formal description

The definition of a functional derivative may be made more mathematically precise and formal by defining the space of functions more carefully. For example, when the space of functions is a Banach space, the functional derivative is known as the Fréchet derivative, while the Gâteaux derivative is used on more general locally convex spaces. Hilbert spaces are special cases of Banach spaces. The more formal treatment allows many theorems from ordinary calculus and analysis to be generalized to corresponding theorems in functional analysis, and numerous new theorems to be stated.

Using the delta function as a test function

The definition given above is based on a relationship that holds for all test functions f, so one might think that it should also hold when f is chosen to be a specific function such as the delta function. However, the delta function is not a valid test function.

In the definition, the functional derivative describes how the functional F[\varphi(x)] changes as a result of a small change in the entire function \varphi(x). The particular form of the change in \varphi(x) is not specified, but it should extend over the whole interval on which x is defined. Using the delta function as the perturbation instead means that \varphi(x) is varied only at the point y and nowhere else.

Often a physicist wants to know how one quantity, say the electric potential V at position r_1, is affected by changing another quantity, say the density of electric charge \rho at position r_2. The potential at a given position is a functional of the density: given a particular density function and a point in space, one can compute a number that represents the potential at that point due to the specified density. Since we are interested in how this number varies across all points in space, we treat the potential as a function of r. To wit,

V(r) = F[\rho] = \frac{1}{4\pi\epsilon_0} \int \frac{\rho(r')}{|r-r'|} \mathrm{d}r'.

That is, for each r, the potential V(r) is a functional of \rho(r'). Applying the definition of functional derivative,


\begin{align}
\left\langle \frac{\delta F[\rho]}{\delta \rho(r')}, f(r') \right\rangle
& {} = \frac{d}{d\varepsilon} \left. \frac{1}{4\pi\epsilon_0} \int \frac{\rho(r') + \varepsilon f(r')}{|r-r'|} \mathrm{d}r' \right|_{\varepsilon=0} \\
& {} = \frac{1}{4\pi\epsilon_0} \int \frac{f(r')}{|r-r'|} \mathrm{d}r' \\
& {} = \left\langle \frac{1}{4\pi\epsilon_0|r-r'|}, f(r') \right\rangle.
\end{align}

So,


\frac{\delta V(r)}{\delta \rho(r')} = \frac{1}{4\pi\epsilon_0|r-r'|}.

Evaluating this at r = r_1 and r' = r_2 shows how the potential at r_1 changes in response to a small variation in the density at r_2; in general, though, the unevaluated form, valid for all r and r', is more useful.
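As a consistency check, suppose a point charge q is added at r_2, i.e. \rho(r') is replaced by \rho(r') + q\,\delta(r'-r_2). Because V is linear in \rho, the first-order change is exact, and pairing the functional derivative with this perturbation gives

\Delta V(r) = \int \frac{\delta V(r)}{\delta \rho(r')}\, q\,\delta(r'-r_2)\, \mathrm{d}r' = \frac{q}{4\pi\epsilon_0 |r-r_2|},

which is the Coulomb potential of the added point charge, as expected.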

Examples

We give a formula for the functional derivative of a common class of functionals: those that can be written as the integral of a function of the independent variable, the function itself, and its derivatives. This is a generalization of the Euler–Lagrange equation: indeed, the functional derivative was introduced in physics in the derivation of the Lagrange equation of the second kind from the principle of least action in Lagrangian mechanics (18th century). The first three examples below are taken from density functional theory (20th century); the entropy example comes from statistical mechanics (19th century).

Formula for the integral of a function and its derivatives

Given a functional of the form

F[\rho(\mathbf{r})] = \int f( \mathbf{r}, \rho(\mathbf{r}), \nabla\rho(\mathbf{r}) )\, d\mathbf{r},

with the test function \phi vanishing on the boundary of the region of integration, the scalar product of the functional derivative with \phi can be written


\begin{align}
\left\langle \frac{\delta F[\rho]}{\delta\rho}, \phi \right\rangle 
& {} = \frac{d}{d\varepsilon} \left. \int f( \mathbf{r}, \rho + \varepsilon \phi, \nabla\rho+\varepsilon\nabla\phi )\, d\mathbf{r} \right|_{\varepsilon=0} \\
& {} = \int \left( \frac{\partial f}{\partial\rho} \phi + \frac{\partial f}{\partial\nabla\rho} \cdot \nabla\phi \right) d\mathbf{r} \\
& {} = \int \left[ \frac{\partial f}{\partial\rho} \phi + \nabla \cdot \left( \frac{\partial f}{\partial\nabla\rho} \phi \right) - \left( \nabla \cdot \frac{\partial f}{\partial\nabla\rho} \right) \phi \right] d\mathbf{r} \\
& {} = \int \left[ \frac{\partial f}{\partial\rho} \phi - \left( \nabla \cdot \frac{\partial f}{\partial\nabla\rho} \right) \phi \right] d\mathbf{r} \\
& {} = \left\langle \frac{\partial f}{\partial\rho} - \nabla \cdot \frac{\partial f}{\partial\nabla\rho}\,, \phi \right\rangle,
\end{align}

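where, in going from the third line to the fourth, the divergence term was converted into a surface integral by the divergence theorem, and that integral vanishes because \phi = 0 on the boundary of the region of integration. Thus the functional derivative is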


\frac{\delta F[\rho]}{\delta\rho} = \frac{\partial f}{\partial\rho} - \nabla \cdot \frac{\partial f}{\partial\nabla\rho}

or, writing the expression more explicitly,


\frac{\delta F[\rho(\mathbf{r})]}{\delta\rho(\mathbf{r})} = \frac{\partial}{\partial\rho(\mathbf{r})}f(\mathbf{r}, \rho(\mathbf{r}), \nabla\rho(\mathbf{r})) - \nabla \cdot \frac{\partial}{\partial\nabla\rho(\mathbf{r})}f(\mathbf{r}, \rho(\mathbf{r}), \nabla\rho(\mathbf{r}))
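The formula can be sanity-checked numerically in one dimension. The Python sketch below assumes the hypothetical integrand f = \frac{1}{2}\rho'^2 + \frac{1}{4}\rho^4 on [0, 1], for which the formula gives \delta F/\delta\rho = \rho^3 - \rho''. It takes \rho(x) = \sin(\pi x) and a test function \phi(x) = x(1-x) that vanishes at the boundary (both illustrative choices), and compares the derivative obtained from the definition with \langle \rho^3 - \rho'', \phi \rangle.

```python
import math

# Hypothetical 1D example: f(x, rho, rho') = rho'^2/2 + rho^4/4, for which
#   dF/drho = df/drho - d/dx (df/drho') = rho^3 - rho''.
# rho(x) = sin(pi x) and phi(x) = x(1 - x) on [0, 1] are arbitrary choices;
# phi vanishes at the boundary, as the derivation assumes.

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))

n = 2001
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]

rho = [math.sin(math.pi * x) for x in xs]
drho = [math.pi * math.cos(math.pi * x) for x in xs]
d2rho = [-math.pi ** 2 * math.sin(math.pi * x) for x in xs]
phi = [x * (1 - x) for x in xs]
dphi = [1 - 2 * x for x in xs]

def F(eps):
    """F[rho + eps*phi], using the analytic derivatives of rho and phi."""
    integrand = [
        0.5 * (dr + eps * dp) ** 2 + 0.25 * (r + eps * p) ** 4
        for r, dr, p, dp in zip(rho, drho, phi, dphi)
    ]
    return trapezoid(integrand, h)

eps = 1e-4
lhs = (F(eps) - F(-eps)) / (2 * eps)          # d/d(eps) F[rho + eps*phi] at 0

# <rho^3 - rho'', phi>, i.e. the formula df/drho - d/dx(df/drho') paired with phi
rhs = trapezoid([(r ** 3 - d2r) * p for r, d2r, p in zip(rho, d2rho, phi)], h)
print(lhs, rhs)
```

The small remaining difference comes from the quadrature, since the left side integrates \rho'\phi' while the right side integrates -\rho''\phi; the two agree exactly only in the continuum limit.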

The formula above covers the particular case in which the functional depends only on the function \rho(\mathbf{r}) and its gradient \nabla\rho(\mathbf{r}). In the more general case in which the functional also depends on higher-order derivatives, i.e.


F[\rho(\mathbf{r})] = \int f( \mathbf{r}, \rho(\mathbf{r}), \nabla\rho(\mathbf{r}), \nabla^2\rho(\mathbf{r}), \dots, \nabla^N\rho(\mathbf{r}))\, d\mathbf{r},

where \nabla^i is a tensor whose n^i components (\mathbf{r} \in \mathbb{R}^n) are all partial derivative operators of order i, i.e. \partial^i/(\partial r^{i_1}_1\, \partial r^{i_2}_2 \dots \partial r^{i_n}_n) with i_1+i_2+\cdots+i_n = i, an analogous application of the definition yields


\begin{align}
\frac{\delta F[\rho]}{\delta \rho} &{} = \frac{\partial f}{\partial\rho} - \nabla \cdot \frac{\partial f}{\partial(\nabla\rho)} + \nabla^2 \cdot \frac{\partial f}{\partial\left(\nabla^2\rho\right)} + \dots + (-1)^N \nabla^N \cdot \frac{\partial f}{\partial\left(\nabla^N\rho\right)} \\
&{} = \sum_{i=0}^N (-1)^{i}\nabla^i \cdot \frac{\partial f}{\partial\left(\nabla^i\rho\right)}.
\end{align}
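For instance, in one dimension and assuming that both \phi and \phi' vanish at the boundary (needed when integrating by parts twice), the functional F[\rho] = \frac{1}{2}\int (\rho''(x))^2\,dx has \partial f/\partial\rho = \partial f/\partial\rho' = 0 and \partial f/\partial\rho'' = \rho'', so only the i = 2 term survives:

\frac{\delta F[\rho]}{\delta\rho(x)} = (-1)^2\,\frac{d^2}{dx^2}\frac{\partial f}{\partial \rho''} = \frac{d^2}{dx^2}\rho''(x) = \rho''''(x).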

Thomas-Fermi kinetic energy functional

The Thomas-Fermi model of 1927 used a kinetic energy functional for a noninteracting uniform electron gas in a first attempt at a density-functional theory of electronic structure:

T_\mathrm{TF}[\rho] = C_\mathrm{F} \int \rho^{5/3}(\mathbf{r}) \, d\mathbf{r}.

T_\mathrm{TF}[\rho] depends only on the charge density \rho(\mathbf{r}) and does not depend on its gradient, Laplacian, or other higher-order derivatives (functionals like this are called “local”). Therefore,

\frac{\delta T_\mathrm{TF}[\rho]}{\delta \rho} = C_\mathrm{F} \frac{\partial \rho^{5/3}(\mathbf{r})}{\partial \rho(\mathbf{r})}  = \frac{5}{3} C_\mathrm{F}  \rho^{2/3}(\mathbf{r}).
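Because T_\mathrm{TF} is local, its functional derivative is simply the pointwise derivative of the integrand, which is easy to confirm numerically. A minimal Python sketch, assuming a 1D grid on [-8, 8], the density \rho(x) = e^{-x^2}, a Gaussian test function, and C_\mathrm{F} = 1 (all illustrative choices; C_\mathrm{F} only scales both sides):

```python
import math

# Check of dT_TF/drho = (5/3) C_F rho^(2/3) on a 1D grid over [-8, 8].
# Illustrative assumptions: C_F = 1, rho(x) = exp(-x^2), and a Gaussian test
# function phi centred at x = 0.5. Here phi/rho stays below 1.5, so
# rho - eps*phi remains positive and the 5/3 power stays real.
C_F = 1.0

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))

n = 4001
L = 8.0
h = 2 * L / (n - 1)
xs = [-L + i * h for i in range(n)]
rho = [math.exp(-x * x) for x in xs]
phi = [math.exp(-4 * (x - 0.5) ** 2) for x in xs]

def T_TF(eps):
    """T_TF[rho + eps*phi] approximated by the trapezoidal rule."""
    return C_F * trapezoid([(r + eps * p) ** (5 / 3) for r, p in zip(rho, phi)], h)

eps = 1e-5
lhs = (T_TF(eps) - T_TF(-eps)) / (2 * eps)        # from the definition
# <(5/3) C_F rho^(2/3), phi>
rhs = trapezoid([5 / 3 * C_F * r ** (2 / 3) * p for r, p in zip(rho, phi)], h)
print(lhs, rhs)
```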

Coulomb potential energy functional

For the classical part of the potential, Thomas and Fermi employed the Coulomb potential energy functional

J[\rho] = \frac{1}{2}\iint \frac{\rho(\mathbf{r}) \rho(\mathbf{r}')}{\vert \mathbf{r}-\mathbf{r}' \vert}\, d\mathbf{r} d\mathbf{r}' = \int \left(\frac{1}{2}\int \frac{\rho(\mathbf{r}) \rho(\mathbf{r}')}{\vert \mathbf{r}-\mathbf{r}' \vert} d\mathbf{r}'\right) d\mathbf{r} = \int j[\mathbf{r},\rho(\mathbf{r})]\, d\mathbf{r}.

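Unlike T_\mathrm{TF}[\rho], J[\rho] is not a local functional: the integrand j[\mathbf{r},\rho] depends on the values of \rho at all points \mathbf{r}', not only at \mathbf{r}, so the functional derivative must be obtained from the definition. Since the kernel 1/\vert \mathbf{r}-\mathbf{r}' \vert is symmetric, the two first-order terms are equal, and together they cancel the factor \frac{1}{2}:

\left\langle \frac{\delta J[\rho]}{\delta\rho}, \phi \right\rangle = \left. \frac{d}{d\varepsilon} J[\rho+\varepsilon\phi] \right|_{\varepsilon=0} = \frac{1}{2}\iint \frac{\phi(\mathbf{r})\rho(\mathbf{r}') + \rho(\mathbf{r})\phi(\mathbf{r}')}{\vert \mathbf{r}-\mathbf{r}' \vert}\, d\mathbf{r} d\mathbf{r}' = \int \left( \int \frac{\rho(\mathbf{r}')}{\vert \mathbf{r}-\mathbf{r}' \vert}\, d\mathbf{r}' \right) \phi(\mathbf{r})\, d\mathbf{r}.

Therefore,

\frac{\delta J[\rho]}{\delta \rho(\mathbf{r})} = \int \frac{\rho(\mathbf{r}')}{\vert \mathbf{r}-\mathbf{r}' \vert}\, d\mathbf{r}'.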

The second functional derivative of the Coulomb potential energy functional is

\frac{\delta^2 J[\rho]}{\delta \rho(\mathbf{r}')\delta\rho(\mathbf{r})} = \frac{\delta}{\delta \rho(\mathbf{r}')} \int \frac{\rho(\mathbf{r}'')}{\vert \mathbf{r}-\mathbf{r}'' \vert}\, d\mathbf{r}'' = \frac{1}{\vert \mathbf{r}-\mathbf{r}' \vert}.
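Both derivatives can be checked on a grid. The Python sketch below is a hypothetical 1D discretization: it replaces 1/\vert \mathbf{r}-\mathbf{r}' \vert with a softened kernel 1/\sqrt{(x-x')^2+a^2} (an assumption made only to avoid the singularity on a grid), represents \delta(x-x_k) as a spike of height 1/h at node k, and compares finite differences of the discretized J with \int K(x_k,x')\rho(x')\,dx' and with K(x_k,x_m). Because the discretized J is quadratic in \rho, the central differences are exact up to rounding.

```python
import math

# Hypothetical 1D discretization of J[rho] = 1/2 ∫∫ rho(x) K(x, x') rho(x') dx dx'
# on a uniform grid. The softened kernel K = 1/sqrt((x - x')^2 + a^2) is an
# assumption used only to avoid the 1/|x - x'| singularity; the check relies
# only on K being symmetric.
a = 0.5
n = 201
L = 5.0
h = 2 * L / (n - 1)
xs = [-L + i * h for i in range(n)]
K = [[1.0 / math.sqrt((x - y) ** 2 + a * a) for y in xs] for x in xs]
rho = [math.exp(-x * x) for x in xs]

def J(r):
    """Riemann-sum approximation of the Coulomb-type energy."""
    return 0.5 * h * h * math.fsum(
        r[i] * K[i][j] * r[j] for i in range(n) for j in range(n)
    )

def perturb(r, changes):
    """Copy of r with changes[node] added at each listed node."""
    out = list(r)
    for node, amount in changes.items():
        out[node] += amount
    return out

k, m = 120, 80               # nodes at x = 1.0 and x = -1.0

# First derivative: perturb with a discrete delta (spike of height eps/h) at node k.
eps = 1e-4
s = eps / h
first = (J(perturb(rho, {k: s})) - J(perturb(rho, {k: -s}))) / (2 * eps)
potential = h * math.fsum(K[k][j] * rho[j] for j in range(n))  # ∫ K(x_k, x') rho(x') dx'

# Second derivative: mixed central difference with spikes at nodes k and m.
eps2 = 1e-2
s2 = eps2 / h
second = (
    J(perturb(rho, {k: s2, m: s2}))
    - J(perturb(rho, {k: s2, m: -s2}))
    - J(perturb(rho, {k: -s2, m: s2}))
    + J(perturb(rho, {k: -s2, m: -s2}))
) / (4 * eps2 ** 2)

print(first, potential)      # agree: the factor 1/2 is cancelled by symmetry of K
print(second, K[k][m])       # agree: the second derivative is the kernel itself
```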

Weizsäcker kinetic energy functional

In 1935 von Weizsäcker proposed adding a gradient correction to the Thomas-Fermi kinetic energy functional to make it better suited to a molecular electron cloud:

T_\mathrm{W}[\rho] = \frac{1}{8} \int \frac{\nabla\rho(\mathbf{r}) \cdot \nabla\rho(\mathbf{r})}{ \rho(\mathbf{r}) } d\mathbf{r} = \frac{1}{8} \int \frac{(\nabla\rho(\mathbf{r}))^2}{\rho(\mathbf{r})}\, d\mathbf{r} = \int t[\rho(\mathbf{r}),\nabla\rho(\mathbf{r})] d\mathbf{r}.

Now T_\mathrm{W}[\rho] depends on the charge density \rho and its gradient \nabla \rho, therefore

\frac{\delta T_\mathrm{W}[\rho]}{\delta \rho} = \frac{\partial t}{\partial \rho} - \nabla\cdot\frac{\partial t}{\partial (\nabla \rho)}  = -\frac{1}{8}\frac{(\nabla\rho(\mathbf{r}))^2}{\rho(\mathbf{r})^2} - \nabla\cdot\left(\frac{1}{4}\frac{\nabla\rho(\mathbf{r})}{\rho(\mathbf{r})}\right) =  \frac{1}{8}  \frac{(\nabla\rho(\mathbf{r}))^2}{\rho(\mathbf{r})^2} - \frac{1}{4}\frac{\nabla^2\rho(\mathbf{r})}{\rho(\mathbf{r})}.
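For \rho(x) = e^{-x^2} in one dimension, (\nabla\rho)/\rho = -2x and \nabla^2\rho/\rho = 4x^2 - 2, so the expression above reduces to \frac{1}{2} - \frac{x^2}{2}. The Python sketch below, assuming that density and a hypothetical Gaussian test function \phi centred at x = 0.3, compares this with a finite-difference derivative of T_\mathrm{W}[\rho + \varepsilon\phi] computed from the definition, using analytic derivatives of \rho and \phi.

```python
import math

# Numerical check of dT_W/drho = (1/8)(rho'/rho)^2 - (1/4) rho''/rho in 1D.
# Assumptions for illustration: rho(x) = exp(-x^2), for which the formula gives
# 1/2 - x^2/2, and a Gaussian test function phi centred at x = 0.3.

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))

n = 4001
L = 8.0
h = 2 * L / (n - 1)
xs = [-L + i * h for i in range(n)]

rho = [math.exp(-x * x) for x in xs]
drho = [-2 * x * math.exp(-x * x) for x in xs]
phi = [math.exp(-4 * (x - 0.3) ** 2) for x in xs]
dphi = [-8 * (x - 0.3) * math.exp(-4 * (x - 0.3) ** 2) for x in xs]

def T_W(eps):
    """T_W[rho + eps*phi] = (1/8) ∫ (rho' + eps phi')^2 / (rho + eps phi) dx."""
    return trapezoid(
        [(dr + eps * dp) ** 2 / (r + eps * p) / 8
         for r, dr, p, dp in zip(rho, drho, phi, dphi)],
        h,
    )

eps = 1e-5
lhs = (T_W(eps) - T_W(-eps)) / (2 * eps)                        # from the definition
rhs = trapezoid([(0.5 - 0.5 * x * x) * p for x, p in zip(xs, phi)], h)  # formula
print(lhs, rhs)
```

Here \phi/\rho stays bounded (below about 1.13), so \rho - \varepsilon\phi never changes sign and the integrand remains well defined on the whole grid.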

Writing a function as a functional

Finally, note that any function can be written in terms of an integral functional. For example,

\rho(\mathbf{r}) = \int \rho(\mathbf{r}') \delta(\mathbf{r}-\mathbf{r}')\, d\mathbf{r}'.

This functional depends only on \rho itself, not on its derivatives, as does the Thomas-Fermi functional above (i.e., both are “local”). Therefore,

\frac{\delta \rho(\mathbf{r})}{\delta\rho(\mathbf{r}')}=\frac{\partial \rho(\mathbf{r}') \delta(\mathbf{r}-\mathbf{r}')}{\partial \rho(\mathbf{r}')} = \delta(\mathbf{r}-\mathbf{r}').

Entropy

The entropy of a discrete random variable is a functional of the probability mass function.


\begin{align}
H[p(x)] = -\sum_x p(x) \log p(x)
\end{align}

Thus,


\begin{align}
\left\langle \frac{\delta H}{\delta p}, \phi \right\rangle 
& {} = \sum_{x'} \frac{\delta H[p(x)]}{\delta p(x')} \, \phi(x') \\
& {} = \left. \frac{d}{d\varepsilon} H[p(x) + \varepsilon\phi(x)] \right|_{\varepsilon=0}\\
& {} = -\frac{d}{d\varepsilon} \left. \sum_x [p(x) + \varepsilon\phi(x)] \log [p(x) + \varepsilon\phi(x)] \right|_{\varepsilon=0} \\
& {} = -\sum_x [1+\log p(x)]\phi(x)\\
& {} = \left\langle -[1+\log p(x)], \phi \right\rangle.
\end{align}

Thus,


\frac{\delta H}{\delta p} = -[1+\log p(x)].
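The result can be checked against a finite difference of H. In the Python sketch below the distribution p and the perturbation \phi are arbitrary illustrative values; \phi is chosen to sum to zero, so p + \varepsilon\phi remains normalized.

```python
import math

# Illustrative distribution p and perturbation phi (phi sums to zero).
p = [0.1, 0.2, 0.3, 0.4]
phi = [0.05, -0.02, 0.01, -0.04]

def H(q):
    """Entropy -sum q log q of a probability mass function given as a list."""
    return -math.fsum(qk * math.log(qk) for qk in q)

eps = 1e-6
plus = [pk + eps * f for pk, f in zip(p, phi)]
minus = [pk - eps * f for pk, f in zip(p, phi)]
lhs = (H(plus) - H(minus)) / (2 * eps)                             # d/d(eps) H[p + eps*phi]
rhs = math.fsum(-(1 + math.log(pk)) * f for pk, f in zip(p, phi))  # <-(1 + log p), phi>
print(lhs, rhs)
```

Because \sum_x \phi(x) = 0 here, the constant 1 in -[1+\log p(x)] contributes nothing to the pairing; it only matters for perturbations that change the total probability.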

Exponential

Let

 F[\varphi(x)]= e^{\int \varphi(x) g(x)dx}.

Using the delta function as a test function,


\begin{align}
\frac{\delta F[\varphi(x)]}{\delta \varphi(y)} 
& {} = \lim_{\varepsilon\to 0}\frac{F[\varphi(x)+\varepsilon\delta(x-y)]-F[\varphi(x)]}{\varepsilon}\\
& {} = \lim_{\varepsilon\to 0}\frac{e^{\int (\varphi(x)+\varepsilon\delta(x-y)) g(x)dx}-e^{\int \varphi(x) g(x)dx}}{\varepsilon}\\
& {} = e^{\int \varphi(x) g(x)dx}\lim_{\varepsilon\to 0}\frac{e^{\varepsilon \int \delta(x-y) g(x)dx}-1}{\varepsilon}\\
& {} = e^{\int \varphi(x) g(x)dx}\lim_{\varepsilon\to 0}\frac{e^{\varepsilon g(y)}-1}{\varepsilon}\\
& {} = e^{\int \varphi(x) g(x)dx}g(y).
\end{align}

Thus,

 \frac{\delta F[\varphi(x)]}{\delta \varphi(y)} = g(y) F[\varphi(x)].
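The same computation can be reproduced on a grid, with the delta function replaced by a spike of height 1/h at one node. In the Python sketch below, the interval [0, 1], \varphi(x) = \sin 3x and g(x) = 1 + x^2 are hypothetical choices, and a central difference replaces the one-sided limit.

```python
import math

# Discretized check of dF/dphi(y) = g(y) F[phi] for F[phi] = exp(∫ phi(x) g(x) dx).
# Grid, phi and g are hypothetical choices; the delta function at node j is a
# spike of height 1/h, as in the limit definition with a delta test function.

def trapezoid(values, h):
    """Composite trapezoidal rule for equally spaced samples."""
    return h * (math.fsum(values) - 0.5 * (values[0] + values[-1]))

n = 1001
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
phi = [math.sin(3 * x) for x in xs]
g = [1 + x * x for x in xs]

def F(ph):
    return math.exp(trapezoid([a * b for a, b in zip(ph, g)], h))

j = 250                                  # interior node, y = x_j = 0.25
eps = 1e-6
plus = list(phi)
plus[j] += eps / h
minus = list(phi)
minus[j] -= eps / h

approx = (F(plus) - F(minus)) / (2 * eps)
exact = g[j] * F(phi)
print(approx, exact)
```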
